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REMARKS 

This reply is submitted in response to the Final Office Action dated November 29, 2004. 
The amendments above and remarks below address the issues raised in the Office Action, and 
thereby place the application in condition for allowance. 

Claims Objections 

Claims 53 and 56 are amended to correct the informality of the additional period at the 
end of the claims cited by the Examiner. 

Double Patenting Rejection 

The grounds for the double patenting rejection are removed by the terminal disclaimer 
filed herewith. 

Claim Rejections under 35 U.S.C. § 102 

Claims 47-57 stand rejected under 35 U.S.C. § 102(e) as being anticipated by Jaber, U.S. 
Patent No. 6,792,441. 

The Jaber patent was filed on March 10, 2001, and claims priority to provisional 
application No. 60/188,412 filed on March 10, 2000. As discussed in more detail below, the 
subject matter of the claims finds support in an application filed prior to March 10, 2000 to 
which the present application claims priority. Hence, Jaber is not prior art relative to the claimed 
invention. 

In particular, the present application is a continuation of 09/728,469, filed on November 
30, 2000. That application claims priority to two provisional applications; 60/192,639 filed on 
March 27, 2000, and 60/168,027 filed on November 30, 1999. The subject matter of the pending 
claims is fully supported by both of the provisional and, in particular, the '027 provisional filed 
November 30, 1999, and is accordingly entitled to the earlier filing date. 

A hand marked copy of the '027 provisional is attached herewith. Thus, referring to 
pages 1 and 2 of the '027 provisional, and specifically in the first paragraph on page 2, there is a 
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description of the fast Fourier transform system as disclosed in the claims of the pending 
application. By way of example, the code section beginning on page 1 1 of the hand marked 
copy of the '027 provisional (file name fft_z.c), contains code describing the calculations taking 
place in the non-final and final stages of the fast Fourier transform. The code on pages 12-22 
represents the non-final stages of calculations. The final pass begins in the middle of page 22. 
In the final pass, the first loop is represented by code on pages 22 to the top of page 26. The 
second loop is represented by code on pages 26-29. 

As evidenced by the '027 provisional, the subject matter of the pending claims is entitled 
to a priority date that precedes the sole cited reference. In view thereof, the Applicants request 
that it be removed as a reference. 

In view of the above amendment, applicant believes the pending application is in 
condition for allowance. Reconsideration and allowance are respectfully requested. 





b v : : 

David J. Powsner 

Registration No.: 31,868 

NUTTER MCCLENNEN & FISH LLP 

World Trade Center West 

155 Seaport Boulevard 

Boston, Massachusetts 02210-2604 

(617)439-2000 

(617)310-9000 (Fax) 

Attorney for Applicant 



6 



Hand Marked Copy of 60/168,017 



IMPROVED METHODS AND APPARATUS FOR FAST FOURIER TRANSFORM 

The invention provides improved methods and apparatus for fast fourier transform. 

From the user's perspective, the code performs an in-place "split-complex" ID FFT (forward or 
inverse) for power of 2 sizes ranging from 16 to 4096, inclusive. 

There are 3 user-callable functions: fft_setupO, fft_ z 0 and fft_free(): 

void fft_setup ( unsigned long LOG2N, FFT_setup * SETUP ); 

void fft_z ( float *Creal, float *Cimag, unsigned long LOG2N, FFT_setup *SETUP ); 

void fft_free ( FFT_setup *SETUP ); 

FFT_setup is a structure defined as follows: 

typedef struct { 
float *twidp; /* pointer to 16-byte aligned 

malloc'ed twiddle buffer */ 

unsigned char *bitrp; /* pointer to static bit-reversal 
table */ 
}FFT_setup; 

A user first calls ffi_setupO specifying a particular FFT size (actually, the base 2 log of the size) 
along with a pointer to an uninitialized FFTjsetup structure. This function allocates (malloc) and 
builds the appropriate "twiddle" table and places a pointer to this table and the appropriate bit- 
reversal table (a static table) in the FFT_setup structure supplied by the caller. 

Next, ffi_zO can be called repeatedly for the same size FFT as was specified in the 
corresponding call to ffi_setupO- The user must also specify the same FFT_setup structure that 
was filled in by that call. The input/output vectors are supplied in a split-complex format with 
the real parts contiguous in the first float vector argument (Creal) and the corresponding 
imaginary parts contiguous in the second float vector argument (Cimag). The call performs a 
forward FFT. To perform an inverse FFT, simply interchange the real and imaginary vectors 
(i.e., specify tjie imaginary vector in the first argument and the real vector in the second 
argument). y 

Finally, the user calls fft_free() to free the twiddle buffer previously allocated and constructed by 
fft_setup(). The user must specify the same FFT_setup structure to both calls. 

Here is a one line description of what is in each file: 

fft.h: user's header file 

fft_bitr: contains static bit-reversal tables for all 9 FFT sizes (16 - 4096) 
fft_setup.c source for fft_setupO and fft_free() 
fft_z.c source for fft_z() 

ppc_vmx.h: macro header file for VMX (altivec) emulation of SIMD instructions. 
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ppc_vmx.c: contains C functions that emulate VMX (altivec) SIMD instructions 

Note that fft_jz() is implemented using macros that emulate VMX SIMD instructions. There is a 
structure (VMX_reg) defined in ppc_vmx.h that emulates a 16-byte VMX SIMD register. The 
floating point variables used in fft_z() are of this type. fift_z.c does *not* contain an optimized 
PPC G4 implementation of fft_z() insofar as the instructions are *not* ordered in an optimal way 
for that processor. However, the primary patent claim is clearly demonstrated in the final pass of 
the FFT which begins on line 661 of fft_z.c. This section performs the final radix-4 in-place pass 
of the FFT but manages to leave the results correctly ordered in the real and imaginary 
input/output vectors. This can be accomplished with 32 or fewer 16-byte "registers" (i.e., 512 or 
fewer bytes of temporary storage). 
.» 

It will be appreciated that the teachings hereof may be applied using different programming 
languages, toolsets, operating systems, platforms and otherwise. 



/ 

/ 



I* File Name: fft.h *| 

I* Description: Header file for FFT functions *| 

I* *| 

I* Mercury Computer Systems, Inc. *| 

I* Copyright (c) 1999 All rights reserved *| 

I* *| 

I* Revision Date Engineer; Reason *| 



* i 



I* 0.0 991119 jg; Created - *| 

\**********************************^ 

* FFT setup structure 

* contains pointers to twiddles and bit-reversed indices 

* pointers are filled in by fft — setup () function 
*/ 

typedef struct { 

float *twidp; 

unsigned char *bitrp; 
} FFT_setup; 

/*' 

* FFT function prototypes 
*/ 

void f ft_f ree ( FFT_setup *SETUP ) ; 

void fft_setup( unsigned long LOG2N, FFT_setup *SETUP ); 

void fft_z( float *Cr, float *Ci, unsigned long LOG2N, FFT_setup *SETUP ); 



/ 



3 



/★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★it* 

I* File Name: fftjaitr.c *| 

I* Description: Special bit-reversed tables for FFT sizes *| 

I * 4 <= L0G2N <= 12 * I 

I * *| 

I* Let: L0G2M = L0G2N - 4 *| 

I * M — 2 A L0G2M • * | 

I* *| 

I * For each table : * I 

I* *| 

I* section 1: *| 

I* nl = bitr[0] = # of elements in section 1 *| 

I* (The first and second elements are not in the table *| 

I* as they, are known to be 0 and M-l, respectively.) *| 

'I*.. . • ' *l 

I* 0, M-l, bitr[l], bitrtnl-2] - *| 

I * indices that bit-reverse to themselves * I 

I * *| 

I * .. section 2 : * | 

I* n2 = bitr[nl-l] = # of elements in section 2 *| 

I* It's always true that nl + n2 = M. *| 

I* (The first element is not in the table and, if *| 

I* n2 != 0, is known to be 1.) *| 

I* *| 

I* (1, bitrfnl]), (bitr[nl+l], bitr[nl+2]), *| 

I* (bitr[M-3], bitr[M-2]) = n2/2 pairs of indices that *| 

I* bit-reverse to each other. bitr[M-l] =0. *| 

I* ■ *l 

I* Mercury Computer Systems, Inc. *| 

I* Copyright (c) 1996 All rights reserved *| 

I* *| 

I* Revision Date Engineer; Reason *| 

I* + , 

I * 0.0 990716 jg; Created *| 

* Table for M = 1 (N = 16) . 
*/ 

unsigned char _f ft_bitr_l [ ] = { 
1, / 
0, 0, 0 / 

}; 

/* " 

* Table for M = 2 (N = 32) . 
*/ 

unsigned char _f f t_bitr_2 [ ] = { 
2, 

0, 0, 0 

}; 
/* 

* Table for M = 4 (N = 64) . 
*/ 

unsigned char fft bitr 4[] = { 



4 



2, 

2, 2, 0 

}; 



/* 

* Table for M = 8 (N = 128) . 
*/ 

unsigned char _f f t_bitr_8 [ ] = { 
4, 2, 5, 
4, 4, 3, 6, 0 

}; 
/* 

* Table for M - 16 (N = 256) . 

'*/ , 

unsigned char _f f t_bitr_16 [ } = { 
\, 6, 9, 

12, 8, 2, 4, 3, 12, 5, 10, 7, 14, 11, 13, 0 

>; 
/* 

* Table for M = 32 (N = 512) . 
*/ 

unsigned char _f ft_bitr_32 [ ] = { 

8, 4, 10, 14, 17, 21, 27, 

24, 16, 2, 8, 3, 24, 5, 20, 6, 12, 7, 28, 

9, 18, 11, 26, 13, 22, 15, 30, 19, 25, 23, 29, 0 

}; 
/* 

* Table for M = 64 (N = 1024). 
V 

unsigned char _f f t_bitr_64 [ ] = { 
8, 12, 18, 30, 33, 45, 51, 
56, 32, 2, 16, 3, 48, 4, 8, 5, 40, 6, 24, 
7, 56, 9, 36, 10, 20, 11, 52, 13, 44, 14, 28, 
15, 60, 17, 34, 19, 50, 21, 42, 22, 26, 23, 58, 

25, 38, 27, 54, 29, 46, 31, 62, 35, 49, 37, 41, 
39, 57, 43, 53, 47, 61, 55, 59, 0 

}; 



* Table f/r M = 128 (N = 2048). 
*/ 

unsigned char _f f t_bitr_128 [ ] = { 

1(5, 8, 20, 28, 34, 42, 54, 62, 65, 73, 85, 93, 99, 107, 119, 

112, 64, 2, 32, 3, 96, 4, 16, 5, 80, 6, 48, 7, 112, 9, 72, 

10, 40, 11, 104, 12, 24, 13, 88, 14, 56, 15, 120, 17, 68, 18, 36, 

19, 100, 21, 84, 22, 52, 23, 116, 25, 76, 26, 44, 27, 108, 29, 92, 

30, 60, 31, 124, 33, 66, 35, 98, 37, 82, 38, 50, 39, 114, 41, 74, 

43, 106, 45, 90, 46, 58, 47, 122, 49, 70, 51, 102, 53, 86, 55, 118," 

57, 78, 59, 110, 61, 94, 63, 126, 67, 97, 69, 81, 71, 113, 75, 105, 

77, 89, 79, 121, 83, 101, 87, 117, 91, 109, 95, 125, 103, 115, 111, 123, 0 

}; 

/* 

* Table for M = 256 (N = 4096) . 



s 



*/ 

unsigned char _f f t_bitr_256 [ ] = { 

16, 24, 36, 60, 66, 90, 102, 126, 129, 153, 165, 189, 195, 219, 231, 
240, 128, 2, 64, 3, 192, 4, 32, 5, 160, 6, 96, 7, 224, 8, 16, 
9, 144, 10, 80, 11, 208, 12, 48, 13, 176, 14, 112, 15, 240, 17, 136, 
18, 72, 19, 200, 20, 40, 21, 163, 22, 104, 23, 232, 25, 152, 26, 88, 
27, 216, 28, 56, 29, 184, 30, 120, 31, 248, 33, 132, 34, 68, 35, 196, 
37, 164, 38, 100, 39, 228, 41, 148, 42, 84, 43, 212, 44, 52, 45, 180, 
46, 116, 47, 244, 49, 140, 50, 76, 51, 204, 53, 172, 54, 108, 55, 236, 
57, 156, 58, 92, 59, 220, 61, 188, 62, 124, 63, 252, 65, 130, 67, 194, 
69, 162, 70, 98, 71, 226, 73, 146, 74, 82, 75, 210, 77, 178, 78, 114, 
79, 242, 81, 138, 83, 202, 85, 170, 86, 106, 87, 234, 89, 154, 91, 218, 
93, 186, 94, 122, 95, 250, 97, 134, 99, 198, 101, 166, 103, 230, 105, 150, 
. 107, 214, 109, 182, 110, 118, 111, 246, 113, 142, 115, 206, 117, 174, 119, 

238, 

121, 158,. 123/ 222, 125, 190, 127, 254, 131, 193, 133, 161, 135, 225, 137/ 
145, 

139, 209, 141, 177, 143, 241, 147, 201, 149, 169, 151, 233, 155, 217, 157, 
185, " " 

159, 249, 163, 197, 167, 229, 171, 213, 173, 181, 175, 245, 179, 205, 183, 
237, 

187, 221, 191, 253, 199, 227, 203, 211, 207, 243, 215, 235, 223, 251, 239, 
247, 0 
}; 
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I* File Name: fft_setup.c *l 

I* Description: Setup for fft_z (split complex in-place FFT) *| 

I* Entry/params: void fft_setup ( ulong L0G2N, *l 

|* ~ FFT_setup *SETUP ) *| 

I* Entry/params: void fft_free ( FFT_setup *SETUP ) *| 

I* " ~ *l 

I* Formula: *l 

I* *| 

I* L0G2N is the log (base 2) of the FFT size. *| 

I* (4 <= L0G2N <= 12) *| 

I* *l 

I* Let: N = 2 A L0G2N *| 

f * L0G2M, = L0G2N -4 * I 

| * M « i A L0G2M . * I 

I* A = 2 * PI / N *| 

I* BITR ( i, m ) = bit-reversal of unsigned integer i *| 

I * over m bits * I 

I * *l 

I* void fft_setup ( ulong LOG2N, FFT_setup *SETUP ) *| 

I* *l 

I* SETUP->twidp is set to an allocated buffer that is *| 

I* 16-byte aligned and contains M sets of 4 x 4 floating *| 

I* point twiddles arranged exactly as follows: *| 

I* *l 

I* cos(kA), cos((k+l)A), cos((k+2)A), cos((k+3)A), *| 

I* sin(kA), sin((k+l)A), sin((k+2)A), sin((k+3)A), *| 

I* cos(2kA), cos (2 (k+1) A) , cos ( 2 ( k+2 ) A) , cos ( 2 ( k+3 ) A) , - *| 

I* sin(2kA), sin(2 (k+l)A) , sin (2 (k+2) A) , sin(2(k+3)A) *| 

I* *l 

I* for k « 0 *l 

I* *l 

I* cos(kA), cos ((k+1) A), cos ((k+2) A), cos ((k+3) A), *| 

I* tan(kA), tan((k+l)A), tan((k+2)A), tan((k+3)A) / *| 

I* cot(2kA), cot (2(k+l)A) , cot (2 (k+2 ) A) , cot (2 (k+3) A) , *| 

I* sin(2kA), sin (2 ( k+1 ) A) , sin (2 (k+2) A) , sin(2(k+3)A) *| 

I* *l 

I* for k = 4 * BITR ( 1, L0G2M ), *| 

I * 4 * BITR ( 2, L0G2M ) , * I 

I* *l 

I* " 4 * BITR ( M-2, L0G2M ) *l 

I * / * I 

I* cos(kA), cos((k+l)A), cos((k+2)A) / cos((k+3)A), *| 

1* sin(kA), sin((k+l)A), sin((k+2)A), sin((k+3)A), *| 

I* cos(2kA) / cos (2 (k+l)A) , cos ( 2 ( k+2 ) A) , cos ( 2 ( k+3 ) A) , *| 

I* sin(2kA) / sin(2 (k+l)A) , sin ( 2 ( k+2 ) A) , sin(2(k+3)A) *| 

I * *l 

I* for k = 4 * (M - 1) *l 

I* *l 

I* SETUP->bitrp is set to static table of M unsigned char *| 

I* bit-reversed index values (L0G2M bits) arranged *| 

I* as follows: *l 

I* *l 

I * section 1 : * I 

I* nl = bitrp[0] = # of elements in section 1 *| 

I * (The first and second elements are not in the table * I 
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I* as they are known to be 0 and M-l, respectively.) *| 

I* *l 

I* 0, M-l, bitrp[l], . .., bitrp[nl-2] = *| 

I * indices that bit-reverse to themselves * I 

I* *l 

I * section 2: * I 

I* n2 = bitrp[nl-l] = # of elements in section 2 *| 

I* It ! s always true that nl + n2 = M. . *| 

I* (The first element is not in the table and, if *| 

I* n2 != 0, is known to be 1.) *| 

I* *l 

I* (1, bitrp[nl]), (bitrp [nl+1] , bitrp [nl+2] ) , . .., *| 

I* (bitrp [M-3], bitrp [M-2] ) - n2/2 pairs of indices that *| 

I* bit-reverse to each other, bitrp [M-l] =0. *| 

I* / *l 

I* void fft_free'( FFT_setup *SETUP )* ' *| 

I* *l 

I * frees SETUP->twidp and sets SETUP->twidp and * | 

I* SETUP->bitrp to 0 *| 

I* *l 

I* Mercury Computer Systems, Inc. *| 

I* Copyright (c) 1999 All rights reserved *| 

I* * *l 

I* Revision Date Engineer; Reason *| 

I* *| 

I* 0.0 991119 jg; Created *| 

\**************************^ 

#include <malloc.h> 
# include <math. h> 
#include "fft.h" 
♦include "ppc__vmx . h" 

#define TWOPI (double ) 6 . 283185307 17958 647 692528 68 
tdefine BITR ( log2x, index, bitr_index ) \ 

{ \ 

ulong _bitr_i, _bitr_x; \ 
_bitr_x = (index) ; \ 
bitr_index =0; \ 

for ( _bitr_i = 0; _bitr_i < (log2x) ; _bitr_i++ ) { \ 
bitr_index «= 1; \ 
bitr_index |= (_bitr__x & 1); \ 
_biltr_x »- 1; \ 

} \ 

} 

extern uchar _f f t_bitr_l [ ] ; 
extern uchar _f f t_bitr_2 [ ] ; 
extern uchar _f f t_bitr_4 [ ] ; 
extern uchar _f ft_bitr_8 [ ] ; 
extern uchar _f f t_bitr_16 [ ] ; 
extern uchar _f f t_bitr_32 [ ] ; 
extern uchar _f f t_bitr_64 [ ] ; 
extern uchar _f f t_bitr_128 [ ] ; 
extern uchar _f f t_bitr_256 [ ] ; 

void fft_setup( ulong LOG2N, FFT_setup *SETUP ) 
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char **mallocp; 
char *buffer; 
float *twidp; 

ulong bitr_i, i, j, log2n_m4, n, nvl6; 

double angle, cosl, cos2, delta, incr, sinl, sin2, twopivn; 
n = 1 « L0G2N; 



buffer - malloc( (n * sizeof ( float ) ) + 20 ) ; 
if ( Ibuffer ) { 

SETUP->twidp = (float + )0; 

return; 

} / • . ... 

twidp = (float *)( (ulong) (buffer + 20) & -15); 
mallocp = (char **) (twidp - 1); 
*mallocp = buffer; 



nvl6 - n » 4; 
log2n_m4 = LOG2N - 4; 
twopivn = TWOPI / (double) n; 
delta = (double) 0. 0; 



for ( i = 0; i < nvl6; i + + ) { 
for ( j = 0; j < 4; j++ ) { 
incr = delta; 
angle = twopivn * incr;. 
cosl = cos (angle); 
sinl = sin (angle); 
incr += delta; 
angle = twopivn * incr; 
cos2 = cos (angle); 
sin2 = sin(angle); 



if ( ( i =- 0 ) || ( i == (nvl6 - 1) ) ) { 
twidp [(i « 4) + j] = (float) cosl; 
twidp[(i « 4) + j + 4] = (float) sinl; 
twidp [(i « 4) + j + 8] = ( float )cos2; 
twidp [(i « 4) + j + 12] - ( float )sin2; 

» / 
else { 

BITR ( log2n_m4, i, bitr_i ) 

twidp [ (bitr_i « 4) + j ] = ( float) cosl; 

twidp [ (bitr_i « 4) + j + 4] = (float) (sinl / cosl) 

twidp [ (bitr_i « 4) + j + 8] = (float) (cos2 / sin2) 

twidp [ (bitr_i « 4) + j + 12] = (float) sin2; 

} 

delta += (double) 1.0; 

} 

} 



SETUP->twidp = twidp; 
if ( L0G2N == 4 ) 

SETUP->bitrp = _fft_bitr_l; 
else if ( L0G2N == 5 ) 
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{ J 



SETUP->bitrp 




fft_ 


_bitr_ 


2; 


else if ( L0G2N 




~6 ) 






SETUP->bitrp 





fft 


bitr 


4; 


else if I L0G2N 




~7 ) ~ 






SETUP->bitro 




fft 


bitr 


8; 


else if ( L0G2N 




~8 ) ~ 






SETUP->bitrp 




fft 


bitr. 


16; 


else if ( LOG2N 










SETUP->bitrp 




fft_ 


bitr_ 


32; 


else if ( LOG2N 




10 ) 






SETUP->bitrp 




fft 


bitr_ 


_64; 


else if ( L0G2N 




"ll ) 






SETUP->bitrp 




fft 


bitr_ 


128; 


else if ( L0G2N 




"12 ') 






SETUP->bitr,p 




fft 


bitr_ 


256; 


return; 











void fft_free( FFT_setup 'SETUP ) 
{ 

char **mallocp; 



(SETUP- 


■>bitrp 




fft 


bitr_ 


1) 1 1 


(SETUP- 


->bitrp 




fft" 


~bitr_ 


2) 1 1 


(SETUP- 


->bitrp 




fft" 


~bitr_ 


4) 1 I 


(SETUP- 


->bitrp 




fft" 


"bitr" 


8) i 1 


(SETUP- 


->bitrp 




fft" 


"bitr] 


16) | | 


(SETUP- 


->bitrp 




fft" 


"bitr" 


32) | | 


(SETUP- 


->bitrp 




fft* 


bitr" 


64) | | 


(SETUP- 


->bitrp 




fft" 


"bitr] 


128) || 


(SETUP- 


->bitrp 




fft" 


"bitr" 


256) ) { 



mallocp = (char **) (SETUP->twidp - 1) ; 



free ( *mallocp ) ; 

} 

SETUP->twidp = (float *)0; 
SETUP->bitrp = (uchar *)0; 
return; 



\o 



i i 



I* File Name: fft_z.c *l 

I* Description: Forward (or Inverse) Complex In-place ID FFT *| 

|* Entry/params: void fft_z ( float *Cr, float *Ci, +| 

|* J ulong L0G2N, FFT^setup *SETUP ) *| 

| * . * I 

I* Formula: *l 

|* *l 

I* Cr/Ci = 2 /N LOG2N-point (4 <= L0G2N <= 12) forward in-place *| 

I* complex Id FFT of the split complex vector stored *| 

I* in Cr and Ci . *l 

| * *l 

| * (Note, an inverse FFT can be performed by swapping *| 

I* Cr and Qi.) *l 

|* * *l 

I * where : * I 

I* *l 

I* Cr and Ci must be 16-byte aligned and have unit stride *| 

I* stride between adjacent real (Cr) and imaginary (Ci) *| 

I * points . * I 

I* ^ *l 

I* L0G2N is the log (base 2) of the FFT size. *| 

|* (4 <= L0G2N <= 12) *l 

|* *l 

I * Let: N = 2 A L0G2N *l 

I* L0G2M = L0G2N - 4 *l 

|* M = 2 A L0G2M *l 

I* A - 2 * PI / N " *l. 

I* BITR ( i, m ) = bit-reversal of unsigned integer i *| 

1 * over m bits * I 

|* *l 

I* SETUP->twidp is a 16-byte aligned pointer to M sets *| 

I* of 4 x 4 floating point twiddles arranged exactly *| 

I* as follows: *l 

|* *l 

I* cos(kA), cos((k+l)A), cos((k+2)A), cos((k+3)A), *| 

I* sin(kA), sin((k+l)A), sin((k+2)A), sin((k+3)A), *| 

|* cos(2kA), cos(2(k+l)A) , cos (2 (k+2 ) A) , cos (2 ( k+3 ) A) , *| 

I* sin(2kA), sin(2 (k+l)A) , sin (2 ( k+2 ) A) , sin(2(k+3)A) *| 

|* *l 

I* for k - 0 *l 

I * ./ * I 

I* cos(kA), cos((k+l)A), cos((k+2)A), cos((k+3)A), *| 

I * tan(kA), tan((k+l)A), tan((k+2)A), tan((k+3)A), *| 

I* cot(2kA), cot(2(k+l)A), cot (2 (k+2) A) , cot (2 (k+3) A) f *| 

I* sin(2kA), sin (2 (k+1) A) , sin ( 2 ( k+2 ) A) , sin(2(k+3)A) *| 

|* *l 

I* for k = 4 * BITR ( 1, L0G2M ), *l 

| * 4 * BITR ( 2, L0G2M ) , *| 

I * *l 

I* 4 * BITR ( M-2, L0G2M ) *l 

| * *| 

I* cos(kA), cos((k+l)A), cos((k+2)A), cos((k+3)A), *| 

I* sin(kA), sin((k+l)A), sin((k+2)A), sin((k+3)A), *| 

I* cos(2kA), cos (2 (k+l)A) , cos (2 ( k+2 ) A) , cos (2 ( k+3) A) , *| 

I* sirt(2kA), sin(2(k+l)A), sin (2 ( k+2 ) A) , sin(2(k+3)A) *| 



*l 



* for k = 4 * (M - 1) *l 

*l 

* SETUP->bitrp is a pointer to M unsigned char * I 

* bit-reversed index values (L0G2M bits) arranged *| 

* as follows: *l 

*l 

* section 1: *l 

* nl = bitrp[0] = # of elements in section 1 *| 

* (The first and second elements are not in the table *| 

* as they are known to be 0 and M-l, respectively.) *| 

*l 

* 0, M-l, bitrpfl], . .., bitrp[nl-2] = *| 

* indices that bit-reverse to themselves *| 

* section 2: *I 

* n2 = bitrp[nl-l] = # of elements in section 2 *| 

* It's always true that nl + n2 = M. *| 
*. (The first element is not in the table and, if *| 

* n2 != 0, is known to be 1.) 



* - (1, bitrp[nl]), (bitrp [nl + 1] , bitrp [nl+2] ) , . .., *| 

* (bitrp [M-3], bitrp [M-2] ) = n2/2 pairs of indices that *| 

* bit-reverse to each other, bitrp [M-l] =0. *| 

*l 

* Mercury Computer Systems, Inc. *| 

* Copyright (c) 1999 All rights reserved *| 

*l 

* Revision Date Engineer; Reason ■ *| 
*' ' — *l 

* 0.0 991119 jg; Created *| 
\**************************^^ 

#include "fft.h" 

# include "ppc_vmx . h" 



/* 

* _ fft _ z 
*/ 



void fft_z ( float *Cr, float *Ci, ulong L0G2N, FFT_setup *SETUP ) 

float *Crl, *Cil, *Cr2, *Ci2, *Cr3, *Ci3; 

float *Cr4, *Ci4, *Cr5, *Ci5, *Cr6, *Ci6, *Cr7, *Ci7; 

float *wp0, *wpl, *wp2, *wp3; 

unsigned .char *bitrp; 

ulong index, index_bump, indexl, index2, windex; 
ulong bflycnt, bflyoff, gent, sent, N; 



VMX_reg aOr, aOi, air, ali, a2r, a2i, a3r, a3i; 

VMX_reg yOr, y0i> ylr, yli, y2r, y2i, y3r, y3i; 

VMX_reg tlr, tli, t2r, t2i, m2r, m2i, m3r, m3i; 

VMX_reg pOr, pOi, plr, pli, p2r, p2i, p3r, p3i; 

VMX_reg xlr, xli, x2r, x2i; 

VMX_reg cosl, sinl, cos2, sin2, tanl, cot2; 
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VMX_ 


reg 


aOr_8, 


aOi_8, 


alr_8, 


ali_8, 


a2r_ 


8, 


a2i_ 


8, 


a3r 


8, 


a3i 


8 


VMX" 


jceq 


a4r_8, 


a4i_8, 


a5r_8, 


a5i_8, 


a6r 


8, 


a6i 


8, 


a7r~ 


8, 


a7i" 


"8 


VMX~ 


reg 


y0r_8, 


yOi 8, 


ylr 8, 


yli 8, 


y2r^ 


8, 


y2i_ 


8, 


y3r 


8, 


y3i; 


"8 


VMX~ 


reg 


y4r 8, 


y4i 8, 


y5r_8, 


y5i 8, 


y6r 


8, 


y6i 


8, 


y7r 


8, 


y7i_ 


"8 


VMX* 


reg 


tlr 8, 


tli_8, 


t2r_8, 


t2i 8, 


t3r~ 


8, 


t3i 


8, 


t4r~ 


8, 


t4i 


"8 


VMX~ 


reg 


t5r 8, 


t5i 8, 


t6r 8, 


t6i 8, 


t7r~ 


8, 


t7i _ 


8, 


t8r~ 


8, 


t8i~ 


"8 


VMX" 


_reg 


dlr_8, 


dli_8, 


d2r_8, 


d2i_8,- 


m2r[ 


8, 


m2i 


8, 


m5r_ 


8, 


m5i_ 


*8 


VMX~ 


_reg 


slr_8, 


sli_8, 


s2r_8, 


s2i_8, 


s3r_ 


8, 


s3i_ 


8, 


s4r_ 


8, 


s4i_ 


"8 


VMX~ 
/* 


_reg 


em4r_8 


, em4i_ 


8, em7r_ 


_8, em7i_8, 


rad2v2; 













* here if N >= 16 
V 

wpO = SETUP->twidp; 
wpl = wpO .+ 4,y 
wp2 = wpO + 8; 
wp3 = wpO + 12; 
bitrp = SETUP->bitrp; 
N = 1 « LOG2N; 

if ( LOG2N & 1 ) { 

/* radix-8 first pass */ 

windex = 64; 

LVEWX( rad2v2, wpO, windex ) 

bflyoff = N » 1; 

VSPLTW( rad2v2, rad2v2, 0 ) 



/* cos (PI/4) = sqrt(2)/2 */ 
/* 4 * N/8 - N/2 byte offset */ 
/* replicate 4 times */ 



Crl 




(float 


*) 


\ (char 


*)Cr + bflyoff) ; 


Gil 




(float 


*) 


; (char 


*)Ci + bflyoff) ; 


Cr2 




(float 


*) 


[ (char 


*)Crl 


+ 


bflyoff) 


Ci2 




(float 


*) 


( (char 


*)Cil 


+ 


bflyoff) 


Cr3 




(float 


*) 


( (char 


*)Cr2 




bflyoff) 


Ci3 




(float 


*) 


; (char 


*)Ci2 


+ 


bflyoff) 


Cr4 




(float 


*) 


[ (char 


*)Cr3 




bflyoff) 


Ci4 




(float 


*) 


[ (char 


*)Ci3 


+ 


bflyoff) 


Cr5 




(.float 


*) 


( (char 


*)Cr4 


+ 


bflyoff) 


Ci5 




(float 


*) 


( (char 


*)Ci4 


+ 


bflyoff) 


Cr6 




(float 


*) 


[ (char 


*)Cr5 




bflyoff) 


Ci6 




(float 


*) 


; (char 


*)Ci5 


+ 


bflyoff) 


Cr7 




/ float 


*) 


[ (char 


*)Cr6 




bflyoff) 


Ci7 




/(float 


*) 


; (char 


*)Ci6 


+ 


bflyoff) 


index 


= 0; 













bf lycnt 
while ( 
LVX( 
LVX{ 
LVX( 
LVX( 
LVX( 
LVX( 
LVX( 
LVX( 
LVX( 



= bflyoff; 
bflycnt ) { 
a0r_8, Cr, 
a0i_8, Ci, 
alr_8, 
ali_8, 
a2r_8, 
a2i_8, 
a3r_8, Cr3, 
a3i_8, Ci3, 
a4r 8, Cr4, 



/* while ( index < bflyoff ) { */ 



Crl, 
Cil, 
Cr2, 
Ci2, 



index ) 
index ) 
index 
index 
index 
index 
index 
index 
index 
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f J 



LVX( a4i_8, Ci4, index ) 

LVX( a5r_8, Cr5, index ) 

LVX( a5i_8, Ci5/ index ) 

LVX( a6r_8, Cr6, index ) 

LVX( a6i_8, Ci6, index ) 

LVX( a7r_8, Cr7, index ) 

LVX( a7i__8', Ci7, index ) 



VADDFP ( 


tlr 


8, 


aOr 


8, 


a4r_ 


8 


VSUBFP ( 


dlr 


8, 


aOr 


8, 


a4r 


8 


VADDFP ( 


tli" 


8, 


aOi 


~8, 


a4i" 


"8 


VSUBFP ( 


dli" 


8, 


aOi 


"8, 


a4i" 


8 


VADDFP ( 


t3r_ 


8, 


air 


8, 


a5r_ 


8 


VSUBFP ( 


,t4r~ 


8, 


a5r" 


8, 


air" 


8 


VADDFP ( 


t3i 


8 , 


ali" 


"8, 


a5i" 


~8 


VSUBFP ( 


t4i~ 


8, 


ali" 


"8, 


a5i" 


~8 


VADDFP ( 


t2r 


8, 


a2r_ 


8, 


a6r 


8 


VSUBFP ( 


d2r 


8, 


a 6r 


8/ 


a2r" 


8 


VADDFP ( 


t2i 


~8, 


a2i" 


"8, 


a6i 


~8 


VSUBFP ( 


d2i 


~8, 


a2i" 


"8, 


a6i 


8 


VADDFP ( 


t5r 


8, 


a3r 


8, 


a7r 


8 


VSUBFP ( 


t6r 


8, 


a7r" 


"8, 


a3r~ 


"8 


VADDFP ( 


t5i 


"8/ 


a3i" 


8, 


a7i" 


8 


VSUBFP ( 


t6i 


"8/ 


a3i 


8, 


a7i" 


~8 


VADDFP ( 


t7r_ 


8, 


tlr 


8, 


t2r 


8 


VSUBFP ( 


m2r" 


8, 


tlr 


8, 


t2r 


"8 


VADDFP ( 


t7i_ 


8, 


tli 


~8, 


t2i 


~8 


VSUBFP ( 


m2i_ 


8, 


tli 


"8, 


t2i~ 


~8 


VADDFP ( 


t8r 


8, 


t5r 


8, 


t3r 


8 


VADDFP ( 


t8i_ 


8, 


t3i 


"8, 


t5i 


"8 


VSUBFP ( 


m5r_ 


8, 


t3i~ 


"8, 


t5i" 


"8 


VSUBFP ( 


m5i_ 




t5r~ 


8, 


t3r I 


^8 


VADDFP ( 


yOr_ 


8, 


t7r 


8, 


t8r 


8 


VADDFP ( 


yOi_ 


8, 


t7i_ 


"8, 


t8i_ 


"8 


VADDFP ( 


y2r 


8, 


m2r_ 


8, 


m5r_ 


"8 


VAD/)FP( y2i_ 
/ 


.8/ 


m2i_ 


8, 


m5i_ 


8 


VSUBFP ( 


y4r_ 


8, 


t7r 


8, 


t8r 


8 


VSUBFP ( 


y4i_ 


8, 


t7i _ 


8, 


t8i_ 


*8 


VSUBFP ( 


y6r_ 


8, 


m2r_ 


"8, 


m5r_ 


8 


VSUBFP ( 


y6i_ 


8, 


m2i_ 


8, 


m5i_ 


"8 



VSUBFP ( em4r_8, t6r_8, t4r_8 ) 

VSUBFP ( em4i_8, t4i_8, t6i_8 ) 

VADDFP ( em7r_8, t4i_8, t6i_8 ) 

VADDFP ( em7i_8, t6r_8, t4r_8 ) 

VMADDFP ( slr_8, rad2v2, em4r_8, dlr_8 ) 
VMADDFP ( sli_8, rad2v2, em4i_8, dli_8 ) 
VNMSUBFP( s2r_8, rad2v2, em4r_8, dlr_8 ) 
VNMSUBFP( s2i_8, rad2v2, em4i_8, dli_8 ) 



I i 



i ) 



VMADDFP ( s3r_8, rad2v2, em7r_8, d2i_8 ) 
VMADDFP ( s3i_8, rad2v2, em7i_8, d2r_8 ) 
VNMSUBFP( s4r_8, rad2v2, em7r_8, d2i_8 ) 
VNMSUBFP( s4i_8, rad2v2, em7i_8, d2r_8 ) 



VADDFP ( ylr_8, slr_8, s3r_8 

VADDFP ( yli_8, sli_8, s3i_8 

VSUBFP( y3r_8, s2r_8, s4r_8 

VSUBFP( y3i_8, s2i_8, s4i_8 

VADDFP ( y5r_8, s2r_8, s4r_8 

VADDFP { y5i_8, s2i_8, s4i_8 

VSUBFP( y7r_8, slr_8, s3r_8 

VSUBFP( ,y7i_8, sli_8, s3i_8 



STVX{ 


yOr_ 


8, 


Cr, 


index ) 


STVX( 


yOi_ 


8, 


Ci, 


index ) 


STVX( 


y2r 


8, 


Cr2, 


index 


STVX( 


y2i~ 


8, 


Ci2, 


index 


STVX( 


y4r_ 


8, 


Crl, 


index 


STVX( 


y4i_ 


8, 


Cil, 


index 


STVX( 


y6r_ 


8, 


Cr3, 


index 


STVX( 


y6i_ 


8, 


Ci3, 


index 


STVX( 


ylr 


8, 


Cr4, 


index 


STVX( 


yli" 


8, 


Ci4 ; 


index 


STVX( 


y3r 


8, 


Cr6, 


index 


STVX( 


y3il 


8, 


Ci6, 


index 


STVX( 


y5r 


8, 


Cr5, 


. index 


STVX( 


y5il 


8, 


Ci5, 


index 


STVX( 


ylr 


8, 


Cr7, 


index 


STVX( 


y7i~ 


8, 


Ci7, 


index 



/* bit-reverse output +/ 



index += 16; 
bflycnt -= 16; 

} 

} 

else { 

bflyoff. = N; 
f 

Crl (float *)((char *)Cr + bflyoff); 

Cil = (float *)((char *)Ci + bflyoff); 

Cr2 = (float *)((char *)Crl + bflyoff); 

Ci2 = (float *)((char *)Cil + bflyoff); 

Cr3 = (float *)((char *)Cr2 + bflyoff); 

Ci3 = (float *)((char *)Ci2 + bflyoff); 

index = 0; 



/* end radix-8 first pass */ 
/* radix-4 first pass */ 
/* 4 * N/4 = N byte offset */ 



bflycnt = bflyoff; 

while ( bflycnt ) { /* while ( index < bflyoff ) { */ 

LVX( aOr, Cr, index ) 
LVX( aOi, Ci, index ) 
LVX( air, Crl, index ) 
LVX( ali, Cil, index ) 
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LVX( a2r, Cr2, index ) 

LVX( a2i, Ci2, index ) 

LVX( a3r, Cr3, index ) 

LVX( a3i, Ci3, index ) 

VADDFP ( tlr, 
VADDFP ( tli, 
VSUBFP( m2r, 
VSUBFP ( m2i, 



VADDFP ( 
VADDFP ( 
VSUBFP ( 
VSUBFP( 

VADDFP ( 
VADDFP ( 
VADDFP ( 
VADDFP ( 

VSUBFP ( 
VSUBFP{ 
VSUBFP ( 
VSUBFP ( 



t2r, 
t2i, 
m3r, 
m3i, 

y 

/ 

yOr, 
yOi, 
yir, 
yii, 

y2r, 
y2i, 
y3r, 
y3i, 



aOr, 


a2r 


aOi, 


a2i 


aOr, 


a2r 


aOi, 


a2i 


a3r, 


air 


ali, 


a3i 


ali, 


a3i 


a3r, 


air 


tlr,- 


t*2r 


tli, 


t2i 


m2r, 


m3r 


m2i, 


m3i 


tlr, 


t2r 


tli, 


t2i 


m2r, 


m3r 


m2i, 


m3i 



STVX( 
STVX( 
STVX( 
STVX( 
STVX( 
STVX( 
STVX( 
STVX( 



yOr, 
yOi, 
yir, 

yli, 
y2r, 
y2i, 
y3r, 

y3i, 



Cr, 

Ci, 

Cr2, 

Ci2, 

Crl, 

Cil, 

Cr3, 

Ci3, 



index ) 
index ) 
index 
index 
index 
index 
index 
index 



index += 16; 
bflycnt -= 16; 



while ( bflyoff > 64 ) { 

index ^bump = bflyoff; 
bflyoff »= 2; 



/* bit-reverse output */ 



/* end radix-4 first pass */ 
/* middle stages */ 



/* decimate by 4 */ 



index_ 


_bump -= 


= bflyoff; 




/* 3 * 


bflyoff */ 


Crl - 


(float 


*) ( (char 


*)Cr 


+ bflyoff) ; 


/* adjust 


Cil = 


(float 


*) ( (char 


*)Ci 


+ bflyoff) ; 




Cr2 = 


(float 


*) ( (char 


*)Crl 


+ bflyoff) ; 




Ci2 = 


(float 


*) ( (char 


*)Cil 


+ bflyoff); 




Cr3 = 


(float 


*) ( (char 


*)Cr2 


+ bflyoff) ; 




Ci3 = 


(float 


*) ( (char 


*)Ci2 


+ bflyoff) ; 




index 


= 0; 











pointers 



bflycnt = bflyoff; 
while ( bflycnt ) { 

LVX( aOr, Cr, index ) 



/* first (weightless) group 



I i 



LVX( aOi, 

LVX( air, 

LVX( ali, 

LVX( a2r, 

LVX( a2i, 

LVX( a3r, 

LVX{ a3i, 



Ci, 

Crl, 

Cil, 

Cr2, 

Ci2, 

Cr3, 

Ci3, 



index ) 
index ) 
index ) 
index ) 
index ) 
index ) 
index ) 



VADDFP ( 


tlr, 


aOr, 


a2r ) 


VHUUC c \ 




dUl/ 


a^ x j 


UCflPlTD ( 


m^r , 


aur , 


azr j 


VoUbt r \ 


m2i , 


aui , 




VADDFP ( 


t2r, 


a3r, 


air ) 


VADDFP ( 


,t2i, 


ali, 


a3i ) 


VSUBFP ( 


m3r, 


ali,- 


a3i ) 


VSUBFP ( 


11131, 


a3r, 


air ) 


VADDFP ( 


yOr, 


tlr, 


t2r ) 


VADDFP ( 


yOi, 


tli, 


t2i ) 


VADDFP ( 


ylr, 


m2r, 


m3r ) 


VADDFP ( 


yli, 


m2i. 


m3i ) 


VSUBFP ( 


y2r, 


tlr, 


t2r ) 


VSUBFP ( 


y2i, 


tli, 


t2i ) 


VSUBFP ( 


y3r, 


m2r, 


m3r ) 


VSUBFP ( 


y3i, 


ni2i, 


m3i ) 



STVX( 
STVX( 
STVX( 
STVX( 
STVX( 
STVX( 
STVX( 
STVX( 



yOr, 
yOi, 
ylr, 
yli, 
y2r, 
y2i, 
y3r, 
y3i, 



Cr, 

Ci, 

Cr2, 

Ci2, 

Crl, 

Cil, 

Cr3, 

Ci3, 



index ) 
index ) 
index ) 
index ) 
index ) 
index ) 
index ) 
index ) 



/* bit-reverse output */ 



index += 16; 
bflycnt — 16; 



/* end of first (weightless) group */ 



winded = 64; 

./ 

gent = N - bflyoff; 
while ( gent ) { 



/* loop for remaining groups */ 



* load weights for group 
*/ 

LVEWX( cosl, wpO, windex ) 
LVEWX( tanl, wpl, windex ) 
wp2, windex ) 
wp3, windex ) 
cosl, 0 ) 
tanl, 0 ) 



cosl, 
tanl, 
LVEWX( cot 2, 
LVEWX( sin2, 
VSPLTW( cosl, 
VSPLTW( tanl, 



VSPLTW( 
VSPLTW( 



cot2, 
sin2, 



cot2, 
sin2, 



/* replicate 4 times V 
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index += index bump; 



bflycnt = bflyoff; 
while ( bflycnt ) { 



LVX( aOr, 
LVX( aOi, 
LVX( air, 
LVX( ali, 
LVX( a2r, 
LVX( a2i, 
LVX( a3r, 
LVX( a3i, 
/ 

VMADDFP ( 
VNMSUBFP ( 
VMADDFP ( 
VNMSUBFP ( 



Cr, 

Ci, 

Crl, 

Cil, 

Cr2, 

Ci2, 

Cr3, 

Ci3, 



index ) 
index ) 
index ) 
index ) 
index ) 
index ) 
index ) 
index ) 



xlr, cot2, a2r, a2i ) 

xli, cot2, a2i, a2r ) 
x2r, cot2, a3r, a3i ) 
x2i, cot2, a3i, a3r ) 



VMADDFP { tlr, sin2, xlr, aOr ) 
VNMSUBFP ( tli, sin2, xli, aOi ) 
VMADDFP ( t2r, sin2, x2r, air ) 
VNMSUBFP { t2i, sin2, x2i, ali ) 

VNMSUBFP ( m2r, sin2, xlr, aOr ) 
VMADDFP ( m2i, sin2, xli, aOi ) 
VNMSUBFP ( m3r, sin2, x2r, air ) 
. VMADDFP ( m3i, sin2, x2i, ali.) 

VMADDFP ( xlr, tanl, t2i, t2r ) 
VNMSUBFP ( xli, tanl, t2r, t2i ) 
VNMSUBFP ( x2r, tanl, m3r, m3i ) 
VMADDFP ( x2i, tanl, m3i, m3r ) 

VMADDFP ( yOr, cosl, xlr, tlr ) 
VMADDFP ( yOi, cosl, xli, tli ) 
VMADDFP ( ylr, cosl, x2r, m2r ) 
VNMSUBFP ( yli, cosl, x2i, m2i ) 

VNMSUBFP ( y2r, cosl, xlr, tlr ) 
/VNMSUBFP ( y2i, cosl, xli, tli ) 
y VNMSUBFP( y3r, cosl, x2r, ra2r ) 
VMADDFP ( y3i, cosl, x2i, m2i ) 



STVX( yOr, Cr, index ) 

STVX( yOi, Ci, index ) 

STVX( ylr, Cr2, index ) 

STVX( yli, Ci2, index ) 

STVX( y2r, Crl, index ) 

STVX( y2i, Cil, index ) 

STVX( y3r, Cr3, index ) 

STVX( y3i, Ci3, index ) 



/* bit-reverse output */ 



index += 16; 
bflycnt -= 16; 



/* end of butterfly loop */ 
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windex += 64; 
gent -= bflyoff; 



if ( bflyoff == 64 ) { 



/* bump weight index */ 

/* end of group loop */ 
/* end of stage loop */ 



Crl 
Cil 
Cr2 
Ci2- 
Cr3 
Ci3 



(float 
(float 
(float 
(float 
(float 
(float 



*) ( (char 
*) ( (char 
*) ( (char 
*) ( (char 
*) ( (char 
*) ( (char 



*)Cr 
*)Ci 



+ 16); 
+ 16); 
*)Crl + 16) 
*)Cil + 16) 
*)Cr2 + 16) 
*)Ci2 + 16) 



/* penultimate stage *> 
/* adjust pointers 



index = 0; 



/* 



/* same as windex */ 



* first group (4 butterflies) is weightless 
*/ 

LVX( aOr, Cr, index ) 

LVX( aOi, Ci, index ) 

LVX( air, Crl, index ) 

LVX( ali, Cil, index ) 

LVX( a2r, Cr2, index ) 

LVX{ a2i, Ci2, index ) 

LVX{ a3r, Cr3, index ) 

LVX( a3i, Ci3, index ) 



VADDFP ( 


tlr, 


aOr, 


a2r 


VADDFP ( 


tli, 


aOi, 


a2i 


VSUBFP ( 


m2r, 


aOr, 


a2r 


VSUBFP( 


m2i, 


aOi, 


a2i 


VADDFP ( 


t2r, 


a3r, 


air 


VADDFP ( 


t2i, 


ali, 


a3i 


VSUBFP ( 


m3r, 


ali, 


a3i 


VSUBFP ( 


m3i, 


a3r, 


air 


VADDFP ( 


yOr, 


tlr, 


t2r 


VADDFP ( 


yOi, 


tli, 


t2i 


VADDFP (• 


yir, 


m2r, 


m3r 


VADDFP/f 


yli, 


m2i, 


m3i 


VSUBFP ( 


y2r, 


tlr, 


t2r 


VSUBFP ( 


y2i, 


tli, 


t2i 


VSUBFP { 


y3r, 


m2r , 


m3r 


VSUBFP { 


y3i/ 


m2i, 


m3i 



STVX( 
STVX( 
STVX( 
STVX( 
STVX( 
STVX( 
STVX( 
STVX (. 



yOr, 
yOi, 
ylr, 
yli, 
y2r, 
y2i, 
y3r, 
y3i, 



Cr, 

Ci, 

Cr2, 

Ci2, 

Crl, 

Cil, 

Cr3, 

Ci3, 



index ) 
index ) 
index ) 
index ) 
index ) 
index ) 
index ) 
index ) 



/* bit-reverse output */ 
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/* 

* • loop for remaining butterflies except the very last 
*/ 

bflycnt = N - 32; 
while ( bflycnt ) { 

index += 64; 



/* 

* load weights for group 
V 

LVEWX( cosl, wpO, index ) 
LVEWX( tanl, wpl, index ) 
LVEWX( cot2, wp2, index ) 
LVEWX( s£n2, wp3, index ) 
VSPLTW( cosl, cosl, 0 ) 
VSPLTW( tanl, tanl, 0 ) 
VSPLTW( cot2, cot2, 0 ) 
VSPLTW( sin2, sin2, 0 ) 

LVX( aOr, Cr, index ) 

LVX( aOi, Ci, index ) 

LVX( air, Crl, index ) 

LVX( ali, Cil, index ) 

LVX( a2r, Cr2, index ) 

LVX( a2i, Ci2, index ) 

LVX( a3r, Cr3, index ) 

LVX( a3i, Ci3, index ) 



/* replicate 4 times */ 



VMADDFP ( xlr, cot2, a2r, a2i ) 
VNMSUBFP ( xli, cot2, a2i, a2r ) 
VMADDFP ( x2r, cot2, a3r, a3i ) 
VNMSUBFP ( x2i, cot2, a3i, a3r ) 

VMADDFP ( tlr, sin2, xlr, aOr ) 
VNMSUBFP ( tli, sin2, xli, aOi ) 
VMADDFP ( t2r, sin2, x2r, air ) 
VNMSUBFP ( t2i, sin2, x2i, ali ) 

VNMSUBFP ( m2r, sin2, xlr, aOr ) 
VMADDFP ( m2i, sin2, xli, aOi ) 
VNMSUBFP ( m3r, sin2, x2r, air ) 
VMADDFP { m3i, sin2, x2i, ali ) 

VMADDFP ( xlr, tanl, t2i, t2r ) 
VNMSUBFP ( xli, tanl, t2r, t2i ) 
VNMSUBFP { x2r, tanl, m3r, m3i ) 
VMADDFP ( x2i, tanl, m3i, m3r ) 

VMADDFP ( yOr, cosl, xlr, tlr ) 
VMADDFP ( yOi, cosl, xli, tli ) 
VMADDFP ( ylr, cosl, x2r, m2r ) 
VNMSUBFP ( yli, cosl, x2i, m2i ) 

VNMSUBFP ( y2r, cosl, xlr, tlr ) 
VNMSUBFP ( y2i, cosl, xli, tli ) 
VNMSUBFP ( y3r, cosl, x2r, m2r ) 



VMADDFP ( y3i, cosl, x2i, m2i ) 



STVX{ 
STVX( 
STVX( 
STVX( 
STVX( 
STVX( 
STVX( 
STVX( 



yOr, 
yOi, 
ylr, 
yli, 
: y2r, 
y2i, 
y3r, 
y3i, 



Cr, 

Ci, 

Cr2, 

Ci2, 

Crl, 

Cil, 

Cr3, 

Ci3, 



index ) 
index ) 
index ) 
index ) 
index ) 
index ) 
index ) 
index ) 



/* bit-reverse output */ 



bflycnt -= 16; 



/* end of butterfly loop */ 



* very last butterfly uses cosine/sine weights for accuracy 
*/ 

index += 64 ; ■ • 



LVEWX( cosl, wpO, index ) 
LVEWX( sinl, wpl, index ) 
LVEWX( cos2, wp2, index ) 
LVEWX( sin2, wp3, index ) 
VSPLTW( cosl, cosl, 0 ) 
VSPLTW( sinl, sinl, 0 ) 
VSPLTW( cos2, cos2, 0 ) 
VSPLTW( sin2, sin2, 0 ) 



/* replicate 4 times */ 



LVX( air, Crl, index ) 

LVX( ali, Cil, index ) 

LVX( a2r, Cr2, index ) 

LVX( a2i, Ci2, index ) 

LVX( a3r, Cr3, index ) 

LVX( a3i, Ci3, index ) 

LVX( aOr, Cr, index ) 

LVX( aOi, Ci, index ) 

VMADDFP ( tlr, cos2, a2r, aOr ) 
VMADDFP { tli, cos2, a2i, aOi ) 
VNMSUBFP( m2r, cos2, a2r, aOr ) 
VNMSUBFP ( m2i, cos2, a2i, aOi ) 

VMADDFP ( tlr, sin2, a2i, tlr ) 
VNMSUBFP ( tli, sin2, a2r, tli. ) 
VNMSUBFP ( m2r, sin2, a2i, m2r ) 
VMADDFP ( m2i, sin2, a2r, m2i ) 

VMADDFP ( t2r, cos2, a3r, air ) 
VMADDFP ( t2i, cos2, a3i, ali ) 
VNMSUBFP ( m3r, cos2, a3r, air ) 
VNMSUBFP ( 11131, cos2, a3i, ali ) 

VMADDFP ( t2r, sin2, a3i, t2r ) 
VNMSUBFP ( t2i, sin2, a3r, t2i ) 
VNMSUBFP ( m3r, sin2, a3i, m3r ) 
VMADDFP ( m3i, sin2, a3r, m3i ) 



f 



VMADDFP ( yOr, cosl, t2r, tlr ) 
VMADDFP ( yOi, cosl, t2i, tli ) 
VNMSUBFP( y2r, cosl, t2r, tlr ) 
VNMSUBFP( y2i, cosl, t2i, tli ) 

VMADDFP ( yOr, sinl, t2i / yOr ) 
VNMSUBFP( yOi, sinl, t2r, yOi ) 
VNMSUBFP( y2r, sinl, t2i, y2r ) 
VMADDFP ( y2i, sinl, t2r, y2i ) 

VNMSUBFP( ylr, sinl, m3r, m2r ) 
VNMSUBFP( yli, sinl, m3i, m2i ) 
VMADDFP ( y3r, sinl, m3r, m2r ) 
VMADDFP ( y3i, sinl, m3i, m2i ) 

/ 

VMADDFP ( ylr, cosl, "m3i, ylr )' 
VNMSUBFP( yli, cosl, m3r, yli ) 
VNMSUBFP( y3r, cosl, m3i, y3r ) 
VMADDFP ( y3i, cosl, m3r, y3i ) 

STVX( yOr, Cr, index ) 

STVX( yOi, Ci, index ) 

STVX( ylr, Cr2, index ) 

STVX( yli, Ci2, index ) 

STVX( y2r, Crl, index ) 

STVX( y2i, Cil, index ) 

STVX( y3r, Cr3, index ) 

STVX{ y3i, Ci3, index ) 



/* bit-reverse output */ 



/* end penultimate pass */ 



/* 

* final pass 
*/ 

Crl - (float *)((char *)Cr + N) ; /* adjust pointers */ 

Cil = (float *)((char *)Ci + N) ; 

Cr2 = (float *)((char *)Crl + N) ; 

Ci2 = (float *)(<char *)Cil + N) ; 

Cr3 - (float *)((char *)Cr2 + N) ; 

Ci3 = (float *)((char *)Ci2 + N) ; 

bflycnt = (ulong) *bitrp; 
windex = >0 ; 
index = "6; 

sent = (bflycnt == 1) ? 1 : 2; 
bflycnt -= sent; 

/* 

* loop for in-place butterflies using cosine/sine weights (at most 2) 
*/ 

while ( sent ) { 

LVX( aOr, Cr, index ) 

LVX( aOi, Ci, index ) 

LVX( air, Crl, index ) 

LVX( ali, Cil, index ) 

LVX( a2r, Cr2, index ) 
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LVX( a2i, Ci2, index ) 
LVX( a3r, Cr3, index ) 
LVX( a3i, Ci3/ index ) 



LVX( cosl, wpO, windex ) 

LVX( sinl, wpl, windex ) 

LVX( cos2, wp2, windex ) 

LVX{ sin2, wp3, windex ) 

/* 

* perform two (real and imaginary) 4x4 permutes 

* but swapping the resulting 2 middle columns 
*/ 



VMRGHW ( 


pOr, 


aOr, 


air 


VMRGHW ( 


pOi/, 


aOi, 


ali 


VMRGHW ( 


plr, 


a2r, 


a3r 


VMRGHW ( 


pli, 


a2i, 


a3i 


VMRGLW ( 


p2r, 


aOr, 


air 


VMRGLW ( 


p2i, 


aOi, 


ali 


VMRGLW { 


p3r, 


a2r, 


a3r 


VMRGLW ( 


p3i, 


a2i, 


a3i 


VMRGHW ( 


aOr, 


pOr, 


plr 


VMRGHW ( 


aOi, 


pOi, 


pli 


VMRGLW ( 


air, 


pOr, 


plr 


VMRGLW ( 


ali, 


pOi, 


pli 


VMRGHW ( 


a2r, 


p2r, 


p3r 


VMRGHW ( 


a2i, 


p2i, 


p3i 


VMRGLW ( 


a3r, 


p2r, 


p3r 


VMRGLW ( 


a3i, 


p2i, 


p3i 



VMADDFP ( tlr, cos2, a2r, aOr ) 
VMADDFP ( tli, cos2, a2i, aOi ) 
VNMSUBFP( m2r, cos2, a2r, aOr 
VNMSUBFP( m2i, cos2, a2i, aOi 

VMADDFP { tlr, sin2, a2i, tlr ) 
VNMSUBFP( tli, sin2, a2r, tli 
VNMSUBFP ( m2r, sin2, a2i, m2r 
VMADDFP { m2i, sin2, a2r, m2i ) 

VMADDFP ( t2r, cos2, a3r, air ) 
VMADDFP ( t2i, cos2, a3i, ali ) 
VNMSUBFP( m3r, cos2, a3r, air 
VNMSUBFP ( m3i, cos2, a3i, ali 

VMADDFP ( t2r, sin2 / a3i, t2r ) 
VNMSUBFP ( t2i, sin2, a3r, t2i 
VNMSUBFP ( m3r, sin2, a3i, m3r 
VMADDFP ( m3i, sin2, a3r, m3i ) 

VMADDFP ( yOr, cosl, t2r, tlr ) 
VMADDFP ( yOi, cosl, t2i, tli ) 
VNMSUBFP ( y2r, cosl, t2r, tlr 
VNMSUBFP ( y2i, cosl, t2i, tli 



A3 



{ 



VMADDFP ( yOr, sinl, t2i, yOr ) 
VNMSUBFP( yOi, sinl, t2r, yOi ) 
VNMSUBFP( y2r, sinl, t2i, y2r ) 
VMADDFP ( y2i, sinl, t2r, y2i ) 

VNMSUBFP( ylr, sinl, m3r, m2r ) 
VNMSUBFP( yli, sinl, m3i, m2i ) 
VMADDFP ( y3r, sinl, m3r, m2r ) 
VMADDFP ( y3i, sinl, m3i, m2i ) 

VMADDFP ( ylr, cosl, m3i, ylr ) 
VNMSUBFP( yli, cosl, m3r, yli ) 
VNMSUBFP ( y3r, cosl, m3i, y3r ) 
VMADDFP ( yM, cosl, m3r, y3i ) 



STVX( 


yOr, 


Cr, 


index ) 


STVX( 


yOi, 


Ci, 


index ) 


STVX( 


ylr, 


Crl, 


index 


STVX( 


yli/ 


Cil, 


index 


STVX( 


y2r, 


Cr2, 


index 


STVX( 


y2i, 


Ci2, 


index 


STVX( 


y3r, 


Cr3, 


index 


STVX( 


y3i, 


Ci3, 


index 


index 


= N 


- 16; 




windex = index 


« 2; 


sent • 


— 1; 







/* no bit-reversal 



} 



/* end butterfly loop */ 



index = (ulong) *++bitrp; 
windex = index << 6; 
index «= 4; 



* loop for remaining in-place butterflies (uses tan, cot weights 
*/ 

while ( bflycnt ) { 



LVX ( 
LVX( 
LVX( 
LVX( 
LVX( 
LVX( 
LVX( 
LVX( 



aOr, 
aOi, 
a^r, 

a2r, 
a2i, 
a3r, 
a3i, 



Cr, 

Ci, 

Crl, 

Cil, 

Cr2, 

Ci2, 

Cr3, 

Ci3, 



index ) 
index ) 
index ) 
index ) 
index ) 
index ) 
index j 
index ) 



LVX( cosl, wpO, windex ) 

LVX ( tanl, wpl, windex ) 

LVX( cot2, wp2, windex ) 

LVX{ sin2, wp3, windex ) 

/* 

* perform two (real and imaginary) 4x4 permutes 

* but swapping the resulting 2 middle columns 
*/ 



VMRftHW ( 




QUI ^ 


air 


VMRGHW ( 
v i ii\on.vK \ 


pux , 


Cl vJ X , 


al ■? 
all 


VMRGHW ( 


plr, 


a2r, 


a3r 


VMRGHW ( 


pli, 


a2i, 


a3i 


VMRGLW ( 

V LllWjUM \ 


n2r 


aOr , 


air 


VMRGLW ( 


n2"i 
p^x , 


aOi 

Cl V X , 


ali 

Cl X X 


VMRGLW ( 


p3r, 


a2r, 


a3r 


VMRGLW ( 


p3i> 


a2i, 


a3i 


VMRGHW ( 
vrJr\on.w \ 


dUI , 


pur, 


pir 


VMRGHW I 
v L JAuni^v \ 


aU 1 / 


pvx ,. 


pil 


VMRGLW ( 


air 

ail ^ 


pux / 


nlr 
pxx 


VMRGLW ( 


ali, 


pOi, 


pli 


VMRGHW ( 


/ 

a2r, 


p2r, 


p3-r 


VMRGHW ( 


a2i, 


p2i, 


p3i 


VMRGLW ( 


a3r, 


p2r, 


p3r 


VMRGLW ( 


a3i, 


p2i, 


P3i 



VMADDFP ( xlr, cot2 / a2r, a2i ) 
VNMSUBFP ( xli, cot2, a2i, a2r ) 
VMADDFP ( x2r, cot2, a3r, a3i ) 
VNMSUBFP ( x2i, cot2, a3i, a3r ) 

VMADDFP ( tlr, sin2, xlr, aOr ) 
VNMSUBFP ( tli, .sin2, xli, aOi ) 
VMADDFP ( t2r, sin2, x2r, air ) 
VNMSUBFP { t2i, sin2, x2i, ali ) 

VNMSUBFP ( m2r, sin2, xlr, aOr ) 
VMADDFP ( m2i, sin2, xli, aOi ) 
VNMSUBFP ( m3r, sin2, x2r, air ) 
VMADDFP ( m3i, sin2, x2i, ali ) 

VMADDFP ( xlr, tanl, t2i, t2r ) 
VNMSUBFP ( xli, tanl, t2r, t2i ) 
VNMSUBFP ( x2r, tanl, m3r, m3i ) 
VMADDFP ( x2i, tanl, m3i, m3r ) 



VMADDFP ( yOr, cosl, xlr, tlr ) 
VMADDF^t yOi, cosl, xli, tli ) 
VMADDFP ( ylr, cosl, x2r, m2r ) 
VNMSUBFP ( yli, cosl, x2i, m2i ) 

VNMSUBFP ( y2r, cosl, xlr, tlr ) 
VNMSUBFP ( y2i, cosl, xli, tli ) 
VNMSUBFP ( y3r, cosl, x2r, m2r ) 
VMADDFP ( y3i, cosl, x2i, m2i ) 

STVX{ yOr, Cr, index ) 

STVX( yOi, Ci, index ) 

STVX( ylr, Crl, index ) 

STVX( yli, Cil, index ) 

STVX( y2r, Cr2, index ) 

STVX{ y2i, Ci2, index ) 

STVX( y3r, Cr3, index ) 



/* no bit-reversal 



£6 



STVX( y3i, Ci3, index ) 

index = (ulong j *++bitrp; 
bflycnt — 1; 
windex = index « 6; 
index «- 4 ; 

} /* end butterfly loop */ 



* loop for out-of-place butterflies 
*/ 

bflycnt ■= index » 4; /* count of bit-reverse indices */ 

windex = 64; 
indexl = 16; 



while ( 


bflycnt ) 


{ 




LVX ( 


cosl, 


wpO, 


windex 


) 


LVX( 


tanl, 


wpl, 


windex 


) 


LVX( 


cot2, 


wp2, 


windex 


) 


LVX( 


sin2, 


wp3, 


windex 


) 


LVX( 


aOr, 


Cr, indexl ) 




LVX ( 


aOi, 


Ci, indexl ) 




LVX ( 


air, 


Crl, 


indexl ) 




LVX( 


ali, 


Cil, 


indexl ) 




LVX( 


a2r, 


Cr2, 


indexl ) 




LVX( 


a2i, 


Ci2, 


indexl ) 




LVX ( 


a3r, 


Cr3, 


indexl ) 




LVX ( 


a3i, 


Ci3, 


indexl ) 





/* 

* perform two (real and imaginary) 4x4 permutes 

* but swapping the resulting 2 middle columns 
*/ 



VMRGHW ( 


pOr, 


aOr, 


air 


VMRGHW ( 


pOi, 


aOi, 


ali 


VMRGHW ( 


plr, 


a2r, 


a3r 


VMRGHW ( 


pli/ 


a2i, 


a3i 


VMRGLW ( 


p2r, 


aOr, 


air 


VMRGLW ( 


p2i, 


aOi, 


ali 


VMRGLW p3r, 


a2r, 


a3r 


VMRGLW ( 


p3i, 


a2i, 


a3i 


VMRGHW ( 


aOr, 


pOr, 


plr 


VMRGHW ( 


aOi, 


pOi, 


pli 


VMRGLW ( 


air , 


pOr, 


plr 


VMRGLW ( 


ali, 


pOi, 


pli 


VMRGHW ( 


a2r, 


p2r, 


p3r 


VMRGHW ( 


a2i, 


p2i, 


p3i 


VMRGLW ( 


a3r, 


p2r, 


p3r 


VMRGLW ( 


a3i, 


p2i, 


P 3i 



VMADDFP ( xlr, cot2, a2r, a2i ) 
VNMSUBFP( xli, cot2, a2i, a2r ) 
VMADDFP ( x2r, cot2, a3r, a3i ) 
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VNMSUBFP( x2i, cot2, a3i, a3r ) 

VMADDFP ( tlr, sin2, xlr, aOr ) 
VNMSUBFP ( tli, sin2, xli, aOi ) 
VMADDFP ( t2r, sin2, x2r, air ) 
VNMSUBFP( t2i, sin2, x2i, ali ) 

VNMSUBFP ( m2r, sin2, xlr, aOr ) 
VMADDFP ( m2i, sin2, xli, aOi ) 
VNMSUBFP( m3r, sin2, x2r, air ) 
VMADDFP ( m3i, sin2, x2i, ali ) 

VMADDFP ( xlr, tanl, t2i, t2r ) 
VNMSUBFP ( xli, tanl, t2r, t2i ) 
VNMSUBFP( 3*2 r, tanl, m3r, m3i ) 
VMADDFP ( x2i, tanl, m3i, m3r ) 

VMADDFP ( yOr, cosl, xlr, tlr ) 
VMADDFP ( yOi, cosl, xli, tli ) 
VMADDFP ( ylr, cosl, x2r, m2r ) 
VNMSUBFP ( yli, cosl, x2i, m2i ) 

VNMSUBFP( y2r, cosl, xlr, tlr ) 
VNMSUBFP ( y2i, cosl, xli, tli ) 
VNMSUBFP ( y3r, cosl, x2r, m2r ) 
VMADDFP ( y3i, cosl, x2i, m2i ) 

index2 = (ulong) *++bitrp; 
windex = index2 « 6; 
index2 <<= 4; 

LVX( cosl, wpO, windex ) 

LVX( tanl, wpl, windex ) 

LVX( cot2, wp2, windex ) 

LVX( sin2, wp3, windex ) 

LVX( aOr, Cr, index2 ) 
LVX( aOi, Ci, index2 ) 
LVX( air, Crl, index2 ) 
LVX( ali, Cil, index2 ) 
LVX( a2r, Cr2, index2 ) 
LVX( a£i, Ci2, index2 ) 
LVX( £3r, Cr3, index2 ) 
LVX( a3i, Ci3, index2 ) 

STVX( yOr, Cr, index2 ) /* no bit-reversal 

STVX( yOi, Ci, index2 ) 

STVX( ylr, Crl, index2 ) 

STVX( yli, Cil, index2 ) 

STVX( y2r, Cr2, index2 ) 

STVX( y2i, Ci2, index2 ) 

STVX( y3r, Cr3, index2 ) 

STVX( y3i, Ci3, index2 ) 

/* 

* perform two (real and imaginary) 4x4 permutes 

* but swapping the resulting 2 middle columns 



£7 



,-*- 

( ) 



*/ 



VMKljriW { 


pOr, 


. aur, 


air 


VMRGHW ( 


pOi, 


aOi, 


ali 


VMRGHW ( 


plr, 


a2r, 


a3r 


T 7TUID /"* LITaT / 


pli, 






VjyiKijijW [ 


p^r, 


n V 


air 

air 


VMRGLW ( 


p2i, 


aOi, 


ali 


VMRGLW ( 


p3r, 


a2r, 


a3r 




pji, 


dZl , 


a Jl 






pur, 


pir 






pUl , 


pn 


VMRGLW ( 


air, 


pOr, 


plr 


VMRGLW ( 


ali; 


pOi, 


pli 


VMRGHW ( 


a2r, 


p2r, 


p3r 


VMRGHW ( 


a2i, 


p2i, 


p3i 


VMRGLW ( 


a3r, 


p2r, 


P 3r 


VMRGLW ( 


a3i, 


p2i, 


P3i 



VMADDFP ( xlr, cot2, a2r, a2i ) 
VNMSUBFP( xli, cot2, a2i, a2r ) 
VMADDFP ( x2r, cot2, a3r, a3i ) 
VNMSUBFP( x2i, cot2, a3i, a3r ) 

VMADDFP ( tlr, sin2, xlr, aOr ) 
VNMSUBFP( tli, sin2, xli, aOi ) 
VMADDFP ( t2r, sin2, x2r, air ) 
VNMSUBFP( t2i, sin2, x2i, ali ) 

VNMSUBFP( m2r, sin2, xlr, aOr ) 
VMADDFP ( m2i, sin2, xli, aOi ) 
VNMSUBFP( m3r, sin2, x2r, air ) 
VMADDFP ( m3i, sin2, x2i, ali ) 

VMADDFP ( xlr, tanl, t2i, t2r ) 
VNMSUBFP( xli, tanl, t2r, t2i ) 
VNMSUBFP( x2r, tanl, m3r, m3i ) 
VMADDFP ( x2i, tanl, m3i, m3r ) 

VMADDFP ( yOr, cosl, xlr, tlr ) 
VMADDFP ( yOi, cosl, xli, tli ) 
VMADDFP ( ylr, cosl, x2r, m2r ) 
VNMSUBFP( yli, cosl, x2i, m2i ) 

VNMSUBFP ( y2r, cosl, xlr, tlr ) 
VNMSUBFP( y2i, cosl, xli, tli ) 
VNMSUBFP ( y3r, cosl, x2r, m2r ) 
VMADDFP ( y3i, cosl, x2i, m2i ) 

STVX( yOr, Cr, indexl ) /* no bit-reversal 

STVX( yOi, Ci, indexl ) 

STVX( ylr, Crl, indexl ) 

STVX( yli, Cil, indexl ) 

STVX( y2r, Cr2, indexl ) 

STVX( y2i, Ci2, indexl ) 



as 



STVX( y3r, Cr3, indexl ) 
STVX( y3i, Ci3, indexl ) 



indexl = (ulong) *++bitrp; 
windex = indexl << 6; 
indexl «= 4; 



bflycnt -= 2; 



/* end butterfly loop 



/ 



I * File Name : ppc_vmx . c * I 

I* Description: Contains C functions that emulate PPC vmx *| 

I* (altivec) instructions *l 

|* *l 

I* Mercury Computer Systems, Inc. *l 

I* Copyright (c) 1999 All rights reserved *| 

| * ^ * I 

I* Revision Date Engineer; Reason *| 

I* *| 

I* 0.0 991119 jg; Created *| 

#include r, ppc_vmx> h" 

long CR[ 8 ]; /* condition register V 

void _lvewx( VMX_reg *vT, ulong rA, ulong rB ) 
{ 

ulong *addr; 
ulong i; 

addr = (ulong *) ( (rA) + (rB) ) ; 
i = ((ulong) addr & Oxc) >> 2; 
(vT)->ul[i] = *addr; 

} 

void _lvx( VMX_reg *vT, ulong rA, ulong rB ) 

{ 

ulong *addr; 
ulong i; 

addr = (ulong *)({(rA) + (rB) ) & -15); 
for ( i = 0; i < 4; i++ ) 
(vT) ->ul [i] = addr [i] ; 

} 

void _stvewx( VMX_reg *vS, ulong rA, ulong rB ) 
{ 

ulong *addr; 
ulong i; 

addr = (ulong *)((rA) + (rB) ) ; 
i = ( (ulo/ig) addr & Oxc) » 2; 
*addr = >(vS) ->ul [i] ; 

} 

void _stvx( VMX_reg *vS, ulong rA, ulong rB ) 
{ 

ulong *addr; 
ulong i; 

addr = (ulong *)(((rA) + (rB) ) & -15); 
for ( i = 0; i < 4; i++ ) 
addr[i] = (vS) ->ul [i] ; 

} 

void __vaddfp( VMX_reg *vT, VMX_reg *vA, VMX_reg *vB ) 
{ 

ulong i; 
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t 



for ( i ■ 0; i < 4; i++ ) 

(vT)->f[i] - (vA)->f[i] + (vB)->f[i]; 

} 

void _vmaddfp( VMX_reg *vT, VMX_reg *vA, VMX_reg *vC, VMX_reg *vB ) 
{ 

ulong i; 

for ( i = 0; i < 4; ) 

(vT)->f[i] - ((vA)->f[i] * (vC)->f[i]) + (vB)->f[i]; 

} 

void _vmrghw( VMX_reg *vT, VMX_reg *vA, VMX_reg *vB ) 
{ 

... VMX__reg v; 
ulong i, j ; / 
for ( i = 0; i < 2; ) { 

j = i + i; 

v.ul[j] = (vA)->ul[i]; 
v.ul[(j+l)] = (vB)->ul[i]; 

} 

for ( i = 0; i < 4; i++ ) 
(vT) ->ul [i] = v.ul [i] ; 

} 

void _vmrglw( VMX_reg *vT, VMX_reg *vA, VMX_reg *vB ) 
{ 

VMX_reg v; 
ulong i, j; 

for ( i = 0; i < 2; i++ ) { 
j - i + i; 

v.ul[j] = (vA)->ul[ (2+i) ] ; 
v.ul[(j+l)] = (vB)->ul[(2+i)]; 

} 

for ( i = 0; i < 4; i++ ) 
(vT)->ul[i] = v.ul[i] ; 

} 

void _vmsubfp( VMX_reg *vT, VMX_reg *vA, VMX_reg *vC, VMX_reg *vB ) 
{ 

ulong i; 

for ( i = 0; i < 4; i++ ) 

(vT)->/[ij = ((vA)->f[i] * (vC)->f[i]) - (vB)->f[i]; 

} 

void _vnmsubfp( VMX_reg *vT, VMX_reg *vA, VMX_reg *vC, VMX_reg *vB ) 
{ 

ulong i; 

for ( i = 0; i < 4; i++ ) 

(vT)->f[i] - -( ( (vA)->f [i] * (vC)->f[i]) - ( vB) ->f [.i] ) ; 

} 

void _vslw( VMX_reg *vT, VMX_reg *vA, VMX_reg *vB ) 
{ 

ulong i, sh; 

for ( i = 0; i < 4; i++ ) { 

sh = (vB)->ul[i] & (ulong) Oxlf; 
(vT)->ul[i] = (vA)->ul[i] « sh; 



v3| 



} 

} 



void _vspltisw{ VMX_reg *vT, long SIMM ) 
{ 

ulong i; 

for ( i = 0; i < 4; i++ ) 

(vT)->l[i] = (long) (SIMM) ; 

} 

void _vspltw( VMX_reg *vT, VMX_reg *vB, ulong UIMM ) 
{ 

ulong i, ul; 
. ul - (vB) ->ul[ (UIMM) & 0x3]; 
for ( i = 0; V< 4; i++ ) 
(vT)->ul[i] = ul; 

} 

void _vsubfp( VMX_reg *vT, VMX_reg *vA, VMX_reg *vB ) 
{ 

ulong i; 

for ( i = 0; i < 4; i++ ) 

(vT)->f[i] = (vA)->f[i] - (vB)->f[i]; 

} 

void _vxor( VMX_reg *vT, VMX_reg *vA, VMX_reg *vB ) 
{ 

ulong i; 

for ( i - 0; i < 4; i++ ) 

(vT)->ul[i] - (vA)->ul[i] A (vB)->ul[ij; 

} 




/★★★★★★★★★★a*********************** ^ 

I* File Name: ppc_vmx.h *l 

I* Description: Header file for PPC vmx (altivec) emulation *| 

I* *l 

I* Mercury Computer Systems, Inc. *| 

I* Copyright (c) 1999 All rights reserved . *| 

I* ^ *l 

I* Revision Date Engineer; Reason *| 



I* 0.0 991119 jg; Created *| 

\***********************************^ 

#define uchar unsigned char 
#define ushort Unsigned short 
#define ulong unsigned -long 

/* 

* define a structure to represent a VMX (SIMD) register 
*/ 



typedef union { 



char 


c[16]; 


uchar 


uc[16] ; 


short 


s[8]; 


ushort 


us [8] ; 


long 


1[4]; 


ulong 


ul[4]; 


float 


f [4]; 


} VMX_reg; 





/* 

* condition register comprised of 8 4-bit fields (0-7) 
V 

extern long CR[]; 
/* 

* prototypes for functions that emulate vmx instructions 
V •, 

void _lvewx( VMX_reg *vT, ulong rA, ulong rB ); 

void _lvx( VMX_reg *vT, ulong rA, ulong rB ); 

void _stvewx ( VMX_reg *vS, ulong rA, ulong rB ); 

void _stvx( ^MX_reg *vS, ulong rA, ulong rB ); 

void _vaddfp'( VMX_reg *vT, VMX_reg *vA, VMX_reg *vB ); 

void _vmaddfp( VMX_reg *vT, VMX_reg *vA, VMX_reg *vC, VMX_reg *vB ); 

void. _vmrghw( VMX_reg *vT, VMX_reg *vA, VMX_reg *vB ); 

void _vmrglw( VMX_reg *vT, VMX_reg *vA, VMXjreg *vB ); 

void _vmsubfp( VMX_reg *vT, VMX_reg *vA, VMX_reg *vC, VMX_reg *vB ); 

void _vnmsubfp( VMX_reg *vT, VMX_reg *vA, VMX__reg +vC, VMXjreg *vB ); 

void _vslw( VMX_reg *vT, VMX_reg *vA, VMXjreg *vB ); 

void _vspltw( VMX_reg *vT, VMX_reg *vB, ulong UIMM ); 

void _vspltisw( VMX_reg *vT, long SIMM ); 

void _vsubfp( VMX_reg *vT, VMX_reg *vA, VMX_reg *vB ); 

void _vxor( VMX_reg *vT, VMX_reg *vA, VMX_reg *vB ); 

/* 

* vmx instuction macros 
*/ 
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#define LVEWX ( vT, rA, rB ) 
#define LVX( vT, rA, rB ) 
idefine STVEWX( vS, rA, rB ) 
#define STVX { vS, rA, rB ) 
#define VADDFP ( vT, vA, vB ) 
#define VMADDFP ( vT, vA, vC, vB ) 
#define VMRGHW ( vT, vA, vB ) 
#define VMRGLW ( vT, vA, vB ) 
#define VMSUBFP( vT, vA, vC, vB ) 
#define VNMSUBFP{ vT, vA, vC, vB ) 
#define VSLW( vT, vA, vB ) 
#define VSPLTW( vT, vB, UIMM ) 
#define VSPLTISW( vT, SIMM ) 
tdefine VSUBFP( vT, vA, vB ) 
#define VXOR( vT,/ vA, vB ) 



_lvewx( &vT, (ulong)rA, (ulong)rB ); 
_lvx( &vT, (ulong)rA, (ulong)rB ); 
_stvewx( &vS, (ulong)rA, (ulong)rB ); 
stvx( &vS, (ulong)rA, (ulong)rB ); 
_vaddfp( &vT, &vA, &vB ); 
_vmaddfp( &vT, &vA, &vC, &vB ); 
_vmrghw( &vT, &vA, &vB ); 
_vmrglw( &vT, &vA, &vB ); 
_vmsubfp( &vT, &vA, &vC, &vB ); 
vnmsubfpC &vT, &vA, &vC, &vB ); 
_vslw( &vT, &vA, &vB ); 
vspltw( &vT, &vB, UIMM ); 
vspltisw( &vT, SIMM ); 
vsubfp( &vT, &vA, &vB ); 
vxor( &vT, &vA, &vB ); 
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